Online learning with graph-structured feedback against adaptive adversaries
We derive upper and lower bounds for the policy regret of $T$-round online
learning problems with graph-structured feedback, where the adversary is
nonoblivious but assumed to have a bounded memory. We obtain upper bounds of
$\widetilde{O}(T^{2/3})$ and $\widetilde{O}(T^{3/4})$ for strongly-observable and
weakly-observable graphs, respectively, based on analyzing a variant of the
Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we
show that a matching lower bound of $\widetilde{\Omega}(T^{2/3})$ is achieved in
the case of full-information feedback. We also study the particular loss
structure of an oblivious adversary with switching costs, and show that in such
a setting, non-revealing strongly-observable feedback graphs achieve a lower
bound of $\widetilde{\Omega}(T^{2/3})$ as well.
Comment: This paper has been accepted to ISIT 2018
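(For reference, since the abstract does not spell it out: policy regret against an adversary with memory $m$ is typically defined as
$R_T = \sum_{t=1}^{T} \ell_t(a_{t-m},\ldots,a_t) - \min_{a}\sum_{t=1}^{T} \ell_t(a,\ldots,a)$,
i.e., the comparator plays a single fixed action on every round, so any loss the adversary inflicts in reaction to the learner's own past actions is charged to the learner. This is the standard bounded-memory formulation from the policy-regret literature, stated here only for context.)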
On the Neural Tangent Kernel of Equilibrium Models
This work studies the neural tangent kernel (NTK) of the deep equilibrium
(DEQ) model, a practical ``infinite-depth'' architecture which directly
computes the infinite-depth limit of a weight-tied network via root-finding.
Even though the NTK of a fully-connected neural network can be stochastic when
its width and depth tend to infinity simultaneously, we show that, in contrast,
a DEQ model enjoys a deterministic NTK under mild conditions despite its width
and depth going to infinity at the same time. Moreover, this deterministic NTK
can be found efficiently via root-finding.
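As a rough illustration of the root-finding view of a DEQ forward pass (a minimal sketch, not the architecture analyzed in the paper; the tanh layer, the dimensions, and the use of scipy.optimize.fsolve are assumptions made here purely for illustration):

import numpy as np
from scipy.optimize import fsolve  # generic root-finder, used here only for illustration

rng = np.random.default_rng(0)
d_in, d_hid = 10, 50

# A single weight-tied layer f(z, x) = tanh(W z + U x): the same W would be applied at
# every depth of the corresponding infinitely deep network. W is scaled so the map
# z -> f(z, x) is a contraction, which guarantees a unique fixed point in this toy example.
W = rng.normal(size=(d_hid, d_hid)) / (3.0 * np.sqrt(d_hid))
U = rng.normal(size=(d_hid, d_in)) / np.sqrt(d_in)

def f(z, x):
    return np.tanh(W @ z + U @ x)

x = rng.normal(size=d_in)

# DEQ forward pass: rather than unrolling layers, solve z* = f(z*, x) directly, i.e. find a
# root of g(z) = z - f(z, x). This is the "infinite-depth limit" of z_{k+1} = f(z_k, x).
z_star = fsolve(lambda z: z - f(z, x), x0=np.zeros(d_hid))

# Sanity check: the fixed-point residual should be near machine precision.
print(np.max(np.abs(z_star - f(z_star, x))))

In the DEQ setting, gradients are obtained by implicit differentiation through z* rather than by backpropagating through solver iterations, which is what makes the weight-tied, fixed-point regime studied in the paper tractable.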
Leveraging Multiple Descriptive Features for Robust Few-shot Image Learning
Modern image classification is based upon directly predicting image classes
via large discriminative networks, making it difficult to assess the intuitive
visual ``features'' that may constitute a classification decision. At the same
time, recent works on joint vision-language models such as CLIP provide ways to
specify natural language descriptions of image classes, but typically focus on
providing a single description per class. In this work, we demonstrate that
an alternative approach, arguably more akin to our understanding of multiple
``visual features'' per class, can also provide compelling performance in the
robust few-shot learning setting. In particular, we automatically enumerate
multiple visual descriptions of each class -- via a large language model (LLM)
-- then use a vision-language model to translate these descriptions into a set
of visual features for each image; we finally use sparse logistic regression to
select a relevant subset of these features to classify each image. This both
provides an ``intuitive'' set of relevant features for each class and, in the
few-shot learning setting, outperforms standard approaches such as linear
probing. When combined with finetuning, we also show that the method
outperforms existing state-of-the-art finetuning approaches in both
in-distribution and out-of-distribution performance.
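A minimal sketch of the three-step pipeline described above, assuming the openai/CLIP package and scikit-learn; the class names, the hand-written descriptions (standing in for LLM-generated ones), the image paths, and the L1-regularized logistic regression hyperparameters are all illustrative rather than the paper's actual setup:

import numpy as np
import torch
import clip  # openai/CLIP; assumed installed via pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from sklearn.linear_model import LogisticRegression

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Stand-ins for LLM-generated visual descriptions (the paper enumerates many per class).
descriptions = {
    "cat": ["a small animal with pointed ears and whiskers", "an animal with soft fur and green eyes"],
    "dog": ["an animal with a snout and floppy ears", "a four-legged pet on a leash"],
}
all_desc = [d for ds in descriptions.values() for d in ds]

with torch.no_grad():
    text_feats = model.encode_text(clip.tokenize(all_desc).to(device))
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def to_features(image_paths):
    # Map each image to its cosine similarity with every description; these similarity
    # scores play the role of interpretable per-image "visual features".
    imgs = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
    with torch.no_grad():
        img_feats = model.encode_image(imgs)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    return (img_feats @ text_feats.T).cpu().numpy()

# Tiny few-shot training set (file paths and labels are placeholders).
X_train = to_features(["cat1.jpg", "dog1.jpg", "cat2.jpg", "dog2.jpg"])
y_train = np.array([0, 1, 0, 1])

# Sparse (L1-penalized) logistic regression selects a small subset of descriptions as features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_train, y_train)
print(clf.coef_)  # nonzero entries indicate which descriptions the classifier relies on

The sparsity pattern of clf.coef_ is what yields the ``intuitive'' per-class feature sets: each nonzero coefficient points at one natural-language description.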
Monotone deep Boltzmann machines
Deep Boltzmann machines (DBMs), one of the first ``deep'' learning methods
ever studied, are multi-layered probabilistic models governed by a pairwise
energy function that describes the likelihood of all variables/nodes in the
network. In practice, DBMs are often constrained, i.e., via the
\emph{restricted} Boltzmann machine (RBM) architecture (which does not permit
intra-layer connections), in order to allow for more efficient inference. In
this work, we revisit the generic DBM approach, and ask the question: are there
other possible restrictions to their design that would enable efficient
(approximate) inference? In particular, we develop a new class of restricted
models, the monotone DBM, which allows for arbitrary self-connections within
each layer but restricts the \emph{weights} in a manner that guarantees the
existence and global uniqueness of a mean-field fixed point. To do this, we
leverage tools from the recently-proposed monotone Deep Equilibrium model and
show that a particular choice of activation results in a fixed-point iteration
that gives a variational mean-field solution. While this approach is still
largely conceptual, it is the first architecture that allows for efficient
approximate inference in fully-general weight structures for DBMs. We apply
this approach to simple deep convolutional Boltzmann architectures and
demonstrate that it allows for tasks such as the joint completion and
classification of images, within a single deep probabilistic setting, while
avoiding the pitfalls of mean-field inference in traditional RBMs.
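To make the mean-field fixed-point idea concrete, here is a toy sketch of mean-field inference in a fully connected pairwise binary model. The crude spectral-norm rescaling of the weights stands in for the paper's monotonicity restriction and is not the monotone-DBM parameterization itself; all sizes and constants are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of binary units in the toy model

# Symmetric couplings with zero diagonal, rescaled so the mean-field update below is a
# contraction (sigmoid'(z) <= 1/4, so a spectral norm below 4 suffices). This rescaling is
# a stand-in for the weight restriction that guarantees a unique mean-field fixed point.
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
W *= 3.0 / np.linalg.norm(W, 2)
b = rng.normal(size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Mean-field inference for the pairwise energy E(x) = -x^T W x / 2 - b^T x over x in {0,1}^n:
# iterate q_i <- sigmoid(sum_j W_ij q_j + b_i), where q_i approximates P(x_i = 1).
q = np.full(n, 0.5)
for _ in range(100):
    q = sigmoid(W @ q + b)

print(np.max(np.abs(q - sigmoid(W @ q + b))))  # fixed-point residual (should be ~0)

In this contractive toy case the plain iteration converges to the unique fixed point; the monotone DBM achieves an analogous existence-and-uniqueness guarantee for general intra-layer connectivity via its monotone parameterization and choice of activation.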
- …